Run-Time Parallelization of Irregular DOACROSS Loops
Abstract
Dependencies between iterations of a loop cannot always be determined at compile time, because they may depend on input data that is known only at run time. A prime example is a loop that accesses an array through subscripts that are themselves elements of another array, whose contents are determined only at run time. Parallelizing such loops requires a run-time analysis, and we describe a new algorithm for performing it. The proposed method handles all types of data dependencies (flow, anti, and output) without requiring any special architectural support in the multiprocessor. Our scheme consists of an inspector, which builds the iteration schedule, and an executor, which uses that schedule to execute the iterations. The inspector stage needs no special synchronization operations, and the executor can be implemented with or without synchronization support. The scheme allows overlap among dependent iterations and requires very little inter-processor communication. Furthermore, the schedule built by the inspector can be reused across loop invocations. The inspector stage has consistent performance (it does not degrade rapidly as the number of iterations or the number of accesses per iteration grows), and the executor stage ensures good speedup.
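The inspector/executor split described above can be illustrated with a minimal sketch. It assumes a hypothetical loop in which iteration `i` reads `y[reads[i]]` and writes `y[writes[i]]`, and both index arrays are known only at run time. The names `inspector`, `executor`, `sequential`, `reads`, `writes` and `body` are illustrative only. The scheduling rule is a generic wavefront (level-by-level) scheme and is not the authors' algorithm. Unlike the scheme in the abstract, it places a barrier between wavefronts, so it does not overlap dependent iterations.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical loop shape (an assumption for this sketch, not the paper's):
#     for i in range(n): y[writes[i]] = body(i, y[reads[i]])
# where `reads` and `writes` are index arrays known only at run time.


def inspector(reads, writes):
    """Assign each iteration a wavefront level so that every flow, anti and
    output dependence runs from a lower level to a higher one."""
    last_write = defaultdict(lambda: -1)  # level of the latest writer of each element
    max_read = defaultdict(lambda: -1)    # highest level of any reader of each element
    levels = []
    for r, w in zip(reads, writes):
        level = 1 + max(
            last_write[r],  # flow: read after an earlier write of y[r]
            last_write[w],  # output: write after an earlier write of y[w]
            max_read[w],    # anti: write after earlier reads of y[w]
        )
        levels.append(level)
        max_read[r] = max(max_read[r], level)
        last_write[w] = level
    # schedule[k] lists, in loop order, the iterations of wavefront k.
    schedule = [[] for _ in range(max(levels, default=-1) + 1)]
    for i, level in enumerate(levels):
        schedule[level].append(i)
    return schedule


def executor(schedule, y, reads, writes, body, workers=4):
    """Run the loop wavefront by wavefront. No two iterations in a wavefront
    conflict on an element (at most they read the same one), so they run
    concurrently."""
    def run(i):
        y[writes[i]] = body(i, y[reads[i]])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for wavefront in schedule:
            # Draining the iterator waits for the whole wavefront (a barrier).
            list(pool.map(run, wavefront))
    return y


def sequential(y, reads, writes, body):
    """Reference: the original loop executed in order."""
    for i in range(len(reads)):
        y[writes[i]] = body(i, y[reads[i]])
    return y


if __name__ == "__main__":
    reads, writes = [0, 1, 0, 2], [1, 2, 3, 0]

    def body(i, v):  # hypothetical loop body
        return 2 * v + i

    schedule = inspector(reads, writes)  # built once
    print(schedule)  # [[0, 2], [1], [3]]
    for _ in range(2):  # reused while the index arrays are unchanged
        print(executor(schedule, [1, 1, 1, 1], reads, writes, body))  # [13, 2, 5, 4]
```

Iterations 0 and 2 share a wavefront because they only read the same element, `y[0]`, and write different ones. Iteration 3 must wait for both the write of `y[2]` (flow) and the earlier reads of `y[0]` (anti). The schedule depends only on `reads` and `writes`, so it can be reused for every invocation in which those arrays do not change. Python threads share the GIL, so the sketch shows the structure of the computation rather than a real speedup.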